Session 11: Prosody

Author

  • Mari Ostendorf
Abstract

This paper provides a brief introduction to prosody research in the context of human-computer communication and an overview of the contributions of the papers in the session.

In large part, prosody is "the relative temporal groupings of words and the relative prominence of certain syllables within these groupings" (Price and Hirschberg [1]). This organization of the words, as Silverman points out [2], "annotates the information structure and discourse role of the text, and indicates to the listener how the speaker believes the content relates to the ...prior knowledge within the discourse context." For example, the relative groupings of words can provide cues to syntactic structure as well as discourse segmentation, and the relative prominence of words can provide cues to semantically important or focused items. Segmentation and focus represent two of the major uses of prosody, but other information may also be cued by intonation patterns, e.g., indication of continuation, finality, or a yes-no question with phrase-final "boundary tones".

Prosody is typically also defined with reference to its suprasegmental nature: "Prosody comprises all the sound attributes of a spoken utterance that are not a property of the individual phones" (Collier) [2]. In addition, prosody can operate at multiple levels (e.g., word, phrase, sentence, paragraph), making computational modeling of prosody particularly challenging. The acoustic correlates of prosody, which include duration of segments and pauses, fundamental frequency (F0), amplitude, and vowel quality, may be influenced by prosodic patterns at more than one level, as well as by inherent segmental properties. Modeling the interactions among the different factors is an important and difficult problem.

Most current linguistic theories of prosody include an abstract or phonological representation of prosody to characterize aspects of phrasing, prominence, and intonation or melody. However, here we also see that abstract representations are of interest for computational modeling. Since it is generally agreed that prosody is not directly related to standard representations of syntactic structure, it is useful to have an intermediate representation to facilitate automatic learning and to simplify model structure. Thus, the form of an abstract representation is an important issue. Ideally, it should include all three main aspects of prosody, and address the needs of both theory and computational models. Many different schemes have been proposed, and variations of two different prosodic transcription systems are used in the papers presented in this session. The ToBI (Tones and Break Indices) system for American English [3] is …


Similar resources

9th ISCA Workshop on Speech Synthesis

Keynote Session 1; Oral Session 1: Prosody; Poster Session 1; Keynote Session 2; Oral Session 2: Deep Learning in Speech Synthesis; Demo Session ...


Session 13: Prosody

The aim of this introductory section is to set the context for Session 13: Prosody. It will do so by defining some basic terms, by considering the status of current research on prosody, and by outlining the papers in the session and how they contribute to and complement previous work in the area. Prosody, perceptually, can be thought of as the relative temporal groupings of words and the relativ...


Social and Linguistic Speech Prosody

NOTE: page numbers refer to the digital form of the proceedings, where full papers are included; these can be downloaded from http://www.speechprosody2014.org/proceedings.pdf
1 Day One, May 20th, Tuesday
Opening Session, 1:30pm-2pm: 1-0-opening (3 bros:welcome!etc)
1.1 Tuesday Session One, 2pm-3:30pm: 1-1-plenary (1+3 presentations)


Displaying prosodic text to enhance expressive oral reading

This study assessed the effectiveness of software designed to facilitate expressive oral reading through text manipulations that convey prosody. The software presented stories in standard (S) and manipulated formats corresponding to variations in fundamental frequency (F), intensity (I), duration (D), and combined cues (C) indicating modulation of pitch, loudness and length, respectively. Ten e...


The Consistency and Stability of Acoustic and Visual Cues for Different Prosodic Attitudes

Recently it has been argued that speakers use conventionalized forms to express different prosodic attitudes [1]. We examined this by looking at across speaker consistency in the expression of auditory and visual (head and face motion) prosodic attitudes produced on multiple different occasions. Specifically, we examined acoustic and motion profiles of a female and a male speaker expressing six...



Publication date: 1993